Add AWS S3 i/o #2175
Conversation
pytorch_lightning/utilities/io.py (Outdated)

    def load_s3_checkpoint(checkpoint_path, map_location, **pickle_load_args):
        from torch.serialization import _legacy_load
why is _legacy_load required here? I think we should aim for torch.load here
We could use boto3 to download an S3 file, then use torch.load. Thoughts?
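A minimal sketch of that idea (the function name and arguments follow the PR's load_s3_checkpoint, but the body and the parse_s3_path helper are assumptions, not the PR's actual code):

```python
from urllib.parse import urlparse


def parse_s3_path(path):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(path)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 path: {path}")
    return parsed.netloc, parsed.path.lstrip("/")


def load_s3_checkpoint(checkpoint_path, map_location=None):
    # Lazy imports so boto3/torch remain optional dependencies.
    import io

    import boto3
    import torch

    bucket, key = parse_s3_path(checkpoint_path)
    buffer = io.BytesIO()
    boto3.client("s3").download_fileobj(bucket, key, buffer)
    buffer.seek(0)
    # torch.load accepts any file-like object, so _legacy_load is not needed.
    return torch.load(buffer, map_location=map_location)
```

Downloading into an in-memory buffer avoids writing a temporary file at all; for very large checkpoints a temp file plus download_file would be the safer variant.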
pytorch_lightning/utilities/io.py (Outdated)

    def save_s3_checkpoint(checkpoint, checkpoint_path):
        from torch.serialization import _legacy_save
same as above. IMO we should aim for torch.save
torch.save can only save locally. We could use boto3 to upload locally saved files. Thoughts?
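One hedged way to implement the save side (the name save_s3_checkpoint mirrors the PR; the body and split_s3_path are a sketch, not the PR's code). Note that torch.save can also write into an in-memory buffer, so a local temp file isn't strictly required:

```python
import io


def split_s3_path(path):
    """'s3://bucket/some/key' -> ('bucket', 'some/key')."""
    bucket, _, key = path[len("s3://"):].partition("/")
    return bucket, key


def save_s3_checkpoint(checkpoint, checkpoint_path):
    # Lazy imports so boto3/torch remain optional dependencies.
    import boto3
    import torch

    bucket, key = split_s3_path(checkpoint_path)
    buffer = io.BytesIO()
    # torch.save accepts any file-like object, so _legacy_save is not needed.
    torch.save(checkpoint, buffer)
    buffer.seek(0)
    boto3.client("s3").upload_fileobj(buffer, bucket, key)
```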
@Borda this has failing tests

Thoughts on ...
I am fixing master here #2176
depends how long the legacy func will be available in PyTorch
I'm thinking about how to add tests. One option is to actually save and load a file against an open-access S3 bucket. Another is to use the moto library to mock S3. Thinking about what the best solution is; any help appreciated.
anything that has "legacy" in its name is a big alarm sign for me hehe :)
@awaelchli I agree that we should avoid the legacy functions; we can use torch.save / torch.load on local files and move them to and from S3 with boto3. Shall we do this then?
The only complication with the above is the following: if model checkpointing is configured to only save the best 5 models, we'll end up saving many more than 5 to S3, because a new file is uploaded every time a new model entry occurs. To get around this, we would also have to delete the corresponding files on S3. Thoughts?
Thoughts on deleting the cloud checkpoint if it's not in the top K models anymore? We would need to edit ...
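One way to keep S3 in sync with save_top_k: compare the paths already uploaded against ModelCheckpoint's current best_k_models dict, then delete the stale objects with boto3. A sketch (stale_checkpoints, delete_s3_checkpoints, and uploaded_paths are hypothetical names, not part of the existing callback):

```python
def stale_checkpoints(best_k_models, uploaded_paths):
    """Return uploaded S3 paths that are no longer among the top k.

    best_k_models: dict of checkpoint path -> monitored metric value, as
    ModelCheckpoint tracks it. uploaded_paths: paths already pushed to S3.
    """
    return sorted(set(uploaded_paths) - set(best_k_models))


def delete_s3_checkpoints(paths):
    # Lazy import so boto3 remains an optional dependency.
    import boto3

    s3 = boto3.client("s3")
    for path in paths:
        bucket, _, key = path[len("s3://"):].partition("/")
        s3.delete_object(Bucket=bucket, Key=key)
```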
Alright, moving everything over to boto3. Will make tests easier that way with moto.
This seems related to #2164, which would support s3/gcs/hdfs.
@f4hy Yeah, it seems so. What shall we do?
So I need support for writing all logs and such to a remote path, not just the checkpoints, and I need support for paths other than just S3. That's why I set up #2164 to use gfile. I think it's a mistake to do this in a way that just implements S3; then we would need to implement the same thing again for GCS or HDFS or others. If the gfile solution is not acceptable, I hope we can find some other lib that would do this for us.
Hmmm, I agree @f4hy. Let's think of a unified way to do this long term. I feel, however, that because S3 is AWS, the cleanest solution to get S3 support would be to use boto3. @Borda @williamFalcon I've just redone the PR to now work using boto3.
I mentioned this in #1532, but https://github.com/RaRe-Technologies/smart_open is really nice. It provides a file-like interface for all kinds of transports. I highly recommend using it for this PR!
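For reference, the smart_open suggestion would look roughly like this (a sketch under the assumption that smart_open.open streams s3://, gs://, hdfs://, and local paths through one file-like interface; the function names here are made up):

```python
def save_checkpoint_any_path(checkpoint, path):
    # Lazy imports keep smart_open/torch optional.
    import torch
    from smart_open import open as any_open

    # The same code path works for local files and remote URIs alike.
    with any_open(path, "wb") as f:
        torch.save(checkpoint, f)


def load_checkpoint_any_path(path, map_location=None):
    import torch
    from smart_open import open as any_open

    with any_open(path, "rb") as f:
        return torch.load(f, map_location=map_location)
```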
This pull request is now in conflict... :(
How is this better than the same gfile interface provided by tensorboard? Again, see #2164, where we don't need to add a new dependency.
@f4hy
@@ -95,7 +96,7 @@ class ModelCheckpoint(Callback):
     def __init__(self, filepath: Optional[str] = None, monitor: str = 'val_loss', verbose: bool = False,
                  save_last: bool = False, save_top_k: int = 1, save_weights_only: bool = False,
-                 mode: str = 'auto', period: int = 1, prefix: str = ''):
+                 mode: str = 'auto', period: int = 1, prefix: str = '', remove_non_top_k_s3_files: bool = True):
here I would keep the signature in sync, so I would drop remove_non_top_k_s3_files
This pull request is now in conflict... :(
@Laksh1997 can we finish this one? :] |
cool, so let's close this one for now and reopen if needed :] |
Before submitting
What does this PR do?
Allows the user to save and load checkpoints to S3 paths (e.g. "s3://my-bucket/my-folder/my-checkpoint.ckpt")
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃